In this tutorial, you can run AtacAnnoR step by step to see how AtacAnnoR works and modify parameters to annotate scATAC-seq cells better.
library(AtacAnnoR)
library(Seurat)
## Attaching SeuratObject
setwd('~/txmdata/scATAC_seq/AtacAnnoR')
SeuratObj_RNA <- readRDS('./data/10X-Multiome/SeuratObj/SeuratObj_RNA.RDS') # reference
SeuratObj_ATAC <- readRDS('./data/10X-Multiome/SeuratObj/SeuratObj_ATAC.RDS') # query
Preprocessing of scATAC-seq data in AtacAnnoR includes two steps:
Gene activity was previously calculated by
Signac::Geneactivity() and saved in ‘ACTIVITY’ assay of the
Seurat object SeuratObj_ATAC, for details about calculating
gene activity, see https://stuartlab.org/signac/reference/geneactivity. You
can also calculate gene activity matrix using other models, like ArchR,
Cicero,
MAESTRO,
etc.
Meta-program matrix can be calculated using
get_nmf_embedding() function in AtacAnnoR, and the default
factor number is set to 50.
query_nmf_embedding <- get_nmf_embedding(peak_counts = SeuratObj_ATAC[['ACTIVITY']]@counts,n_factors = 50)
## Performing TF-IDF normalization...
## Performing NMF...
We next keep the common features of reference and query.
pre_processing_mtxs <- pre_processing(ref_mtx = SeuratObj_RNA[['RNA']]@counts,
query_mtx = SeuratObj_ATAC[['ACTIVITY']]@counts,
verbose = T)
## Keep 16523 intersect genes
ref_mtx <- pre_processing_mtxs$ref_mtx
query_mtx <- pre_processing_mtxs$query_mtx
In the first round annotation, AtacAnnoR performs a gene-level annotation and select a fraction of cells as seed cell candidates.
In addition to the conventional marker genes (referred to as global marker genes here), we also identify the neighbor marker genes for a reference cell type that distinguish it from closely related adjacent cell types.
global_markers <- get_global_markers_sc(sc_counts_mtx = ref_mtx,
labels = SeuratObj_RNA$true,
max_marker = 200)
neighbor_celltypes <- get_neighbor_celltypes(sc_count_mtx = ref_mtx,
labels = SeuratObj_RNA$true,
global_markers,
min.cor = 0.6)
neighbor_markers <- get_neighbor_markers_sc(sc_counts_mtx = ref_mtx,
labels = SeuratObj_RNA$true,
neighbor_celltypes = neighbor_celltypes,
global_markers = global_markers)
We can assess the quality of the global marker genes by function
plot_global_markers_heatmap()
plot_ref_global_markers_heatmap(ref_mtx = ref_mtx,
ref_labels = SeuratObj_RNA$true,
global_markers = global_markers,
neighbor_celltypes = neighbor_celltypes)
As shown in the heatmap, neighbor markers sometimes cannot
distinguish cell subtypes, such as the B cell subtypes “Naive B”,
“Intermediate B” and “Memory B”. So function
plot_neighbor_markers_heatmap() helps to evaluate the
efficacy of these markers in distinguishing neighboring cell types.
plot_ref_neighbor_markers_heatmap(ref_mtx = ref_mtx,
ref_labels = SeuratObj_RNA$true,
neighbor_markers = neighbor_markers,
neighbor_celltypes = neighbor_celltypes,
celltype_to_plot = 'Memory B')
This is done by calculating the Kendall’s tau between each query
cell’s gene activity profile and each reference cell type’s gene
expression profile, and the candidate cell label is determined by
selecting the most similar reference cell type. The predicted labels are
saved in cell_meta$kendall_pred.
cor_mtx <- get_cor_mtx(sc_count_mtx = ref_mtx,
labels = SeuratObj_RNA$true,
query_mtx = query_mtx,
global_markers = global_markers,
query_nmf_embedding = query_nmf_embedding)
## Processing reference...
## Processing query...
## Intersect genes number: 781
## Computing Kendall correlation coefficient using 10 threads...
cell_meta <- get_kendall_pred(cor_mtx)
This step is done by testing whether the marker genes are of higher activity than the background genes using statistical test, which result in two scores: global marker significant score(GMSS) and neighbor marker significant score(NMSS). Then, the seed cell candidates are selected by choosing those cells with high scores using Gaussian Mixture Model (GMM).
cell_meta <- test_markers(query_mtx,cell_meta,global_markers,neighbor_markers,threads = 10,verbose = T)
## Start testing markers using 10 threads...
cell_meta <- get_seed_candidates(cell_meta)
At the end of the first round annotation, seed cell candidates are
selected according to the gene-level information of ATAC cells, we can
visualize the distribution of these cells on UMAP by function
plot_pred_umap() or check their NMF meta-program profile
quality by function plot_pred_nmf().
plot_pred_umap(Seurat.object = SeuratObj_ATAC,
cell_meta = cell_meta,
category = 'seed_candidate')
## Loading required package: Signac
plot_pred_nmf(query_nmf_embedding = query_nmf_embedding,
cell_meta = cell_meta,
neighbor_celltypes = neighbor_celltypes,
category = 'seed_candidate')
In the second round annotation, AtacAnnoR performs a genome-wide-level annotation of all query scATAC-seq cells using the seed cells from query itself to avoid batch effect.
Seed cell cleaning is done by applying WKNN algorithm to the seed cell candidates themselves, and discarding those with inconsistent labels.
cell_meta <- seed_cleaning(cell_meta,query_nmf_embedding)
Also, the UMAP distribution and the NMF meta-program heatmap of the
seed cells can be visualized by function plot_pred_umap()
and plot_pred_nmf().
plot_pred_umap(Seurat.object = SeuratObj_ATAC,
cell_meta = cell_meta,
category = 'seed')
plot_pred_nmf(query_nmf_embedding = query_nmf_embedding,
cell_meta = cell_meta,
neighbor_celltypes = neighbor_celltypes,
category = 'seed')
We can also check the heatmap illustrating the activity of the
corresponding reference cell type-specific global/neighbor marker genes
in query seed cells by function
plot_seed_global_markers_heatmap() and
plot_seed_neighbor_markers_heatmap() respectively.
plot_seed_global_markers_heatmap(query_mtx = query_mtx,
cell_meta = cell_meta,
global_markers = global_markers,
neighbor_celltypes = neighbor_celltypes)
plot_seed_neighbor_markers_heatmap(query_mtx = query_mtx,
cell_meta = cell_meta,
neighbor_markers = neighbor_markers,
neighbor_celltypes = neighbor_celltypes,
celltype_to_plot = 'Memory B')
Final prediction is done by applying WKNN algorithm again to predicted the left unlabeled cells using seed cells’ labels.
cell_meta <- WKNN_predict(cell_meta,query_nmf_embedding)
In the same way, the UMAP distribution and the NMF meta-program
heatmap of the final predictions can be visualized by function
plot_pred_umap() and plot_pred_nmf().
plot_pred_umap(Seurat.object = SeuratObj_ATAC,
cell_meta = cell_meta,
category = 'final')
plot_pred_nmf(query_nmf_embedding = query_nmf_embedding,
cell_meta = cell_meta,
neighbor_celltypes = neighbor_celltypes,
category = 'final')
Finally, the cell type proportion of the reference, seed cell
candidates, seed cells, and final predictions can be visualized by using
function plot_celltype_proportions().
plot_celltype_proportions(cell_meta,ref_labels = SeuratObj_RNA$true)
sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] grid stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Signac_1.9.0 ggplot2_3.4.2 ComplexHeatmap_2.14.0
## [4] RcppML_0.5.6 SeuratObject_4.1.3 Seurat_4.3.0
## [7] AtacAnnoR_0.3.0
##
## loaded via a namespace (and not attached):
## [1] circlize_0.4.15 spam_2.9-1 fastmatch_1.1-3
## [4] plyr_1.8.8 igraph_1.4.2 lazyeval_0.2.2
## [7] sp_1.6-0 splines_4.2.3 BiocParallel_1.32.6
## [10] listenv_0.9.0 scattermore_0.8 GenomeInfoDb_1.35.16
## [13] digest_0.6.31 foreach_1.5.2 htmltools_0.5.5
## [16] fansi_1.0.4 magrittr_2.0.3 tensor_1.5
## [19] cluster_2.1.4 doParallel_1.0.17 ROCR_1.0-11
## [22] Biostrings_2.66.0 globals_0.16.2 matrixStats_0.63.0
## [25] spatstat.sparse_3.0-1 colorspace_2.1-0 ggrepel_0.9.3
## [28] xfun_0.38 dplyr_1.1.1 RCurl_1.98-1.12
## [31] crayon_1.5.2 jsonlite_1.8.4 progressr_0.13.0
## [34] spatstat.data_3.0-1 survival_3.5-5 zoo_1.8-11
## [37] iterators_1.0.14 glue_1.6.2 polyclip_1.10-4
## [40] gtable_0.3.3 zlibbioc_1.44.0 XVector_0.38.0
## [43] leiden_0.4.3 GetoptLong_1.0.5 shape_1.4.6
## [46] future.apply_1.10.0 BiocGenerics_0.44.0 abind_1.4-5
## [49] scales_1.2.1 mvtnorm_1.1-3 spatstat.random_3.1-4
## [52] miniUI_0.1.1.1 Rcpp_1.0.10 viridisLite_0.4.1
## [55] xtable_1.8-4 clue_0.3-63 reticulate_1.28
## [58] dotCall64_1.0-2 stats4_4.2.3 htmlwidgets_1.6.2
## [61] httr_1.4.5 RColorBrewer_1.1-3 ellipsis_0.3.2
## [64] ica_1.0-3 pkgconfig_2.0.3 farver_2.1.1
## [67] sass_0.4.5 uwot_0.1.14 deldir_1.0-6
## [70] utf8_1.2.3 tidyselect_1.2.0 labeling_0.4.2
## [73] rlang_1.1.0 reshape2_1.4.4 later_1.3.0
## [76] munsell_0.5.0 pbmcapply_1.5.1 tools_4.2.3
## [79] cachem_1.0.7 cli_3.6.1 generics_0.1.3
## [82] ggridges_0.5.4 evaluate_0.20 stringr_1.5.0
## [85] fastmap_1.1.1 yaml_2.3.7 goftest_1.2-3
## [88] knitr_1.42 fitdistrplus_1.1-8 purrr_1.0.1
## [91] RANN_2.6.1 pbapply_1.7-0 future_1.32.0
## [94] nlme_3.1-162 mime_0.12 RcppRoll_0.3.0
## [97] compiler_4.2.3 rstudioapi_0.14 plotly_4.10.1
## [100] png_0.1-8 spatstat.utils_3.0-2 tibble_3.2.1
## [103] bslib_0.4.2 pcaPP_2.0-3 stringi_1.7.12
## [106] highr_0.10 lattice_0.20-45 Matrix_1.5-1
## [109] vctrs_0.6.1 pillar_1.9.0 lifecycle_1.0.3
## [112] spatstat.geom_3.1-0 lmtest_0.9-40 jquerylib_0.1.4
## [115] GlobalOptions_0.1.2 RcppAnnoy_0.0.20 bitops_1.0-7
## [118] data.table_1.14.8 cowplot_1.1.1 irlba_2.3.5.1
## [121] GenomicRanges_1.50.2 httpuv_1.6.9 patchwork_1.1.2
## [124] R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20
## [127] gridExtra_2.3 IRanges_2.32.0 parallelly_1.35.0
## [130] codetools_0.2-19 MASS_7.3-58.3 rjson_0.2.21
## [133] withr_2.5.0 Rsamtools_2.14.0 sctransform_0.3.5
## [136] GenomeInfoDbData_1.2.9 S4Vectors_0.36.2 parallel_4.2.3
## [139] tidyr_1.3.0 rmarkdown_2.21 Cairo_1.6-0
## [142] Rtsne_0.16 spatstat.explore_3.1-0 shiny_1.7.4